Out-of-Order Instruction Fetch Using Multiple Sequencers

Authors

  • Paramjit S. Oberoi
  • Gurindar S. Sohi
Abstract

Conventional instruction fetch mechanisms fetch contiguous blocks of instructions in each cycle. They are difficult to scale since taken branches make it hard to increase the size of these blocks beyond eight instructions. Trace caches have been proposed as a solution to this problem, but they use cache space inefficiently. We show that fetching large blocks of contiguous instructions, or wide fetch, is inefficient for modern out-of-order processors. Instead of the usual approach of fetching large blocks of instructions from a single point in the program, we propose a high-bandwidth fetch mechanism that fetches small blocks of instructions from multiple points in a program. In this paper, we demonstrate that it is possible to achieve high-bandwidth fetch by using multiple narrow fetch units operating in parallel. Our mechanism performs as well as a trace cache, does not waste cache space, is more resilient to instruction cache misses, and is a natural fit for techniques that require fetching multiple threads, like multithreading, dual-path execution, and speculative threads.
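
The contrast the abstract draws between one wide fetch unit and several narrow ones can be illustrated with a toy cycle count. This is only a sketch under strong assumptions, not the authors' mechanism or simulator: the dynamic instruction stream is reduced to a list of run lengths between taken branches, every sequencer is assumed to know its start point immediately (perfect prediction, no cache misses), and the cost of putting out-of-order fetched instructions back in program order is ignored. The function names and the example run lengths are hypothetical.

```python
from math import ceil


def wide_fetch_cycles(runs, width):
    """Cycles for one sequencer fetching up to `width` contiguous instructions.

    Assumption: a fetch block ends at the first taken branch, so any
    unused width in that cycle is wasted.
    """
    return sum(ceil(run / width) for run in runs)


def multi_sequencer_cycles(runs, sequencers, width):
    """Cycles for several narrow sequencers, each working on its own run.

    Assumption: an idle sequencer can start on the next pending run at the
    beginning of a cycle (idealized start-point prediction).
    """
    pending = list(runs)        # runs not yet handed to a sequencer
    active = [0] * sequencers   # instructions left in each sequencer's run
    cycles = 0
    while pending or any(active):
        # Hand the next pending runs to idle sequencers.
        for i in range(sequencers):
            if active[i] == 0 and pending:
                active[i] = pending.pop(0)
        # Each sequencer fetches up to `width` instructions this cycle.
        active = [max(0, left - width) for left in active]
        cycles += 1
    return cycles


# Hypothetical stream: six runs of 3 to 6 instructions between taken branches.
example_runs = [5, 3, 6, 4, 5, 3]
wide = wide_fetch_cycles(example_runs, width=16)
narrow = multi_sequencer_cycles(example_runs, sequencers=4, width=4)
print(wide, narrow)
```

With the same aggregate width of 16 instructions per cycle, the single wide unit in this toy model spends one cycle per run, while four 4-wide sequencers overlap the runs and finish in fewer cycles. The numbers say nothing about real workloads; they only show why short runs between taken branches limit a single wide fetch unit.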

Related resources

An Exploration Of Instruction Fetch Requirement In Out-of-order Superscalar Processors

Automated design of superscalar processors can provide future in terms a cycles-per-instruction (CPI) using the application program statistics and the 124, Optimization of Instruction Fetch Mechanisms for High Issue Rates 117, A first-order superscalar processor model Karkhanis, Smith 2004. Because superscalar architectures include complicated control logic for out-of-order execu...

HydraScalar: A Multipath-Capable Simulator

Even sophisticated branch-prediction techniques necessarily suffer some mispredictions, and even relatively small mispredict rates hurt performance substantially in current-generation processors. This suggests the study of multipath execution, in which the processor simultaneously executes code from both the taken and not-taken outcomes of a branch. This paper describes HydraScalar, a simulator...

Reducing the Performance Impact of Instruction Cache Misses

In conventional processors, each instruction cache fetch brings in a ...

Performance Study of a Multithreaded Superscalar Microprocessor

This paper describes a technique for improving the performance of a superscalar processor through multithreading. The technique exploits the instruction-level parallelism available both inside each individual stream, and across streams. The former is exploited through out-of-order execution of instructions within a stream, and the latter through execution of instructions from different streams ...

Design of Trace Caches for High Bandwidth Instruction Fetching

In modern high performance microprocessors, there has been a trend toward increased superscalarity and deeper speculation to extract instruction level parallelism. As issue rates rise, more aggressive instruction fetch mechanisms are needed to be able to fetch multiple basic blocks in a given cycle. One such fetch mechanism that shows a great deal of promise is the trace cache, originally propo...

Publication date: 2002